The packages used in this tutorial can be installed with the following commands:
install.packages("tidyverse")
install.packages("sf")
install.packages("raster")
install.packages("spData")
install.packages("spDataLarge")The packages we will require for this section of the workshop are:
library("tidyverse")
library("sf")
library("raster")
library("spData")
library("spDataLarge")Vector and raster spatial data types share concepts intrinsic to spatial data. Perhaps the most fundamental of these is the Coordinate Reference System (CRS), which defines how the spatial elements of the data relate to the surface of the Earth (or other bodies). CRSs are either geographic or projected.
Geographic coordinate systems identify any location on the Earth’s surface using two values — longitude and latitude. Longitude is location in the East-West direction in angular distance from the Prime Meridian plane. Latitude is angular distance North or South of the equatorial plane. Distances in geographic CRSs are therefore not measured in meters. This has important consequences.
The surface of the Earth in geographic coordinate systems is represented by a spherical or ellipsoidal surface. Spherical models assume that the Earth is a perfect sphere of a given radius. Spherical models have the advantage of simplicity but are rarely used because they are inaccurate: the Earth is not a sphere! Ellipsoidal models are defined by two parameters: the equatorial radius and the polar radius. These are suitable because the Earth is compressed: the equatorial radius is around 11.5 km longer than the polar radius [@maling_coordinate_1992].1
Ellipsoids are part of a wider component of CRSs: the datum. This contains information on what ellipsoid to use (with the ellps parameter in the PROJ CRS library) and the precise relationship between the Cartesian coordinates and location on the Earth’s surface. These additional details are stored in the towgs84 argument of proj4string notation (see proj4.org/parameters.html for details). These allow local variations in Earth’s surface, for example due to large mountain ranges, to be accounted for in a local CRS. There are two types of datum — local and geocentric. In a local datum such as NAD83 the ellipsoidal surface is shifted to align with the surface at a particular location. In a geocentric datum such as WGS84 the center is the Earth’s center of gravity and the accuracy of projections is not optimized for a specific location. Available datum definitions can be seen by executing st_proj_info(type = "datum").
Projected CRSs are based on Cartesian coordinates on an implicitly flat surface. They have an origin, x and y axes, and a linear unit of measurement such as meters. All projected CRSs are based on a geographic CRS, described in the previous section, and rely on map projections to convert the three-dimensional surface of the Earth into Easting and Northing (x and y) values in a projected CRS.
This transition cannot be done without adding some distortion. Therefore, some properties of the Earth’s surface are distorted in this process, such as area, direction, distance, and shape. A projected coordinate system can preserve only one or two of those properties. Projections are often named based on a property they preserve: equal-area preserves area, azimuthal preserve direction, equidistant preserve distance, and conformal preserve local shape.
There are three main groups of projection types - conic, cylindrical, and planar. In a conic projection, the Earth’s surface is projected onto a cone along a single line of tangency or two lines of tangency. Distortions are minimized along the tangency lines and rise with the distance from those lines in this projection. Therefore, it is the best suited for maps of mid-latitude areas. A cylindrical projection maps the surface onto a cylinder. This projection could also be created by touching the Earth’s surface along a single line of tangency or two lines of tangency. Cylindrical projections are used most often when mapping the entire world. A planar projection projects data onto a flat surface touching the globe at a point or along a line of tangency. It is typically used in mapping polar regions. st_proj_info(type = "proj") gives a list of the available projections supported by the PROJ library.
Two main ways to describe CRS in R are an epsg code or a proj4string definition. Both of these approaches have advantages and disadvantages. An epsg code is usually shorter, and therefore easier to remember. The code also refers to only one, well-defined coordinate reference system. On the other hand, a proj4string definition allows you more flexibility when it comes to specifying different parameters such as the projection type, the datum and the ellipsoid.2 This way you can specify many different projections, and modify existing ones. This also makes the proj4string approach more complicated. epsg points to exactly one particular CRS.
Spatial R packages support a wide range of CRSs and they use the long-established PROJ library. Other than searching for EPSG codes online, another quick way to find out about available CRSs is via the rgdal::make_EPSG() function, which outputs a data frame of available projections. Before going into more detail, it’s worth learning how to view and filter them inside R, as this could save time trawling the internet. The following code will show available CRSs interactively, allowing you to filter ones of interest (try filtering for the OSGB CRSs for example):
crs_data = rgdal::make_EPSG()
View(crs_data)In sf the CRS of an object can be retrieved using st_crs(). For this, we need to read-in a vector dataset:
vector_filepath = system.file("vector/zion.gpkg", package = "spDataLarge")
new_vector = st_read(vector_filepath)Our new object, new_vector, is a polygon representing the borders of Zion National Park (?zion).
st_crs(new_vector) # get CRS
#> Coordinate Reference System:
#> No EPSG code
#> proj4string: "+proj=utm +zone=12 +ellps=GRS80 ... +units=m +no_defs"In cases when a coordinate reference system (CRS) is missing or the wrong CRS is set, the st_set_crs() function can be used:
new_vector = st_set_crs(new_vector, 4326) # set CRS## Warning: st_crs<- : replacing crs does not reproject data; use st_transform
## for that
The warning message informs us that the st_set_crs() function does not transform data from one CRS to another.
Examples of geographic (WGS 84; left) and projected (NAD83 / UTM zone 12N; right) coordinate systems for a vector data type.
The projection() function can be used to access CRS information from a Raster* object:
raster_filepath = system.file("raster/srtm.tif", package = "spDataLarge")
new_raster = raster(raster_filepath)
projection(new_raster) # get CRS## [1] "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
The same function, projection(), is used to set a CRS for raster objects. The main difference, compared to vector data, is that raster objects only accept proj4 definitions:
projection(new_raster) = "+proj=utm +zone=12 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0
+units=m +no_defs" # set CRSExamples of geographic (WGS 84; left) and projected (NAD83 / UTM zone 12N; right) coordinate systems for raster data.
We will expand on CRSs and how to project from one CRS to another in the following sections.
In some cases the CRS is unknown, as shown below using London as an example:
london = data.frame(lon = -0.1, lat = 51.5) %>%
st_as_sf(coords = c("lon", "lat"))
st_is_longlat(london)## [1] NA
This shows that unless a CRS is manually specified or is loaded from a source that has CRS metadata, the CRS is NA. A CRS can be added to sf objects with st_set_crs() as follows:
london_geo = st_set_crs(london, 4326)
st_is_longlat(london_geo)## [1] TRUE
The CRS can also be added when creating sf objects with the crs argument (e.g., st_sf(geometry = st_sfc(st_point(c(-0.1, 51.5))), crs = 4326)). The same argument can also be used to set the CRS when creating raster datasets (e.g., raster(crs = "+proj=longlat")).
Datasets without a specified CRS can cause problems. An example is provided below, which creates a buffer of one unit around london and london_geo objects:
london_buff_no_crs = st_buffer(london, dist = 1)
london_buff = st_buffer(london_geo, dist = 1)## Warning in st_buffer.sfc(st_geometry(x), dist, nQuadSegs, endCapStyle =
## endCapStyle, : st_buffer does not correctly buffer longitude/latitude data
## dist is assumed to be in decimal degrees (arc_degrees).
Only the second operation generates a warning. The warning message is useful, telling us that the result may be of limited use because it is in units of latitude and longitude, rather than meters or some other suitable measure of distance assumed by st_buffer(). The consequences of a failure to work on projected data are illustrated in the London example (left panel): the buffer is elongated in the north-south direction because lines of longitude converge towards the Earth’s poles.
geosphere::distGeo(c(0, 0), c(1, 0)) to find the precise distance). This shrinks to zero at the poles. At the latitude of London, for example, meridians are less than 70 km apart (challenge: execute code that verifies this). Lines of latitude, by contrast, have constant distance from each other irrespective of latitude: they are always around 111 km apart, including at the equator and near the poles.
Do not interpret the warning about the geographic (longitude/latitude) CRS as “the CRS should not be set”: it almost always should be! It is better understood as a suggestion to reproject the data onto a projected CRS. This suggestion does not always need to be heeded: performing spatial and geometric operations makes little or no difference in some cases (e.g., spatial subsetting). But for operations involving distances such as buffering, the only way to ensure a good result is to create a projected copy of the data and run the operation on that. This is done in the code chunk below:
london_proj = data.frame(x = 530000, y = 180000) %>%
st_as_sf(coords = 1:2, crs = 27700)The result is a new object that is identical to london, but reprojected onto a suitable CRS (the British National Grid, which has an EPSG code of 27700 in this case) that has units of meters. We can verify that the CRS has changed using st_crs() as follows (some of the output has been replaced by ...):
st_crs(london_proj)
#> Coordinate Reference System:
#> EPSG: 27700
#> proj4string: "+proj=tmerc +lat_0=49 +lon_0=-2 ... +units=m +no_defs"Notable components of this CRS description include the EPSG code (EPSG: 27700), the projection (transverse Mercator, +proj=tmerc), the origin (+lat_0=49 +lon_0=-2) and units (+units=m).3 The fact that the units of the CRS are meters (rather than degrees) tells us that this is a projected CRS: st_is_longlat(london_proj) now returns FALSE and geometry operations on london_proj will work without a warning, meaning buffers can be produced from it using proper units of distance.
As pointed out above, moving one degree means moving a bit more than 111 km at the equator (to be precise: 111,320 meters). This is used as the new buffer distance:
london_proj_buff = st_buffer(london_proj, 111320)The result in the figure below (right panel) shows that buffers based on a projected CRS are not distorted: every part of the buffer’s border is equidistant to London.
Buffers around London with a geographic (left) and projected (right) CRS. The gray outline represents the UK coastline.
The importance of CRSs (primarily whether they are projected or geographic) has been demonstrated using the example of London. The subsequent sections go into more depth, exploring which CRS to use and the details of reprojecting vector and raster objects.
The previous section showed how to set the CRS manually, with st_set_crs(london, 4326). In real world applications, however, CRSs are usually set automatically when data is read-in. The main task involving CRSs is often to transform objects, from one CRS into another. But when should data be transformed? And into which CRS? There are no clear-cut answers to these questions and CRS selection always involves trade-offs [@maling_coordinate_1992]. However, there are some general principles provided in this section that can help you decide.
First it’s worth considering when to transform. In some cases transformation to a projected CRS is essential, such as when using geometric functions such as st_buffer(), as shown by the figure above. Conversely, publishing data online with the leaflet package may require a geographic CRS. Another case is when two objects with different CRSs must be compared or combined, as shown when we try to find the distance between two objects with different CRSs:
st_distance(london_geo, london_proj)
# > Error: st_crs(x) == st_crs(y) is not TRUETo make the london and london_proj objects geographically comparable one of them must be transformed into the CRS of the other. But which CRS to use? The answer is usually ‘to the projected CRS’, which in this case is the British National Grid (EPSG:27700):
london2 = st_transform(london_geo, 27700)Now that a transformed version of london has been created, using the sf function st_transform(), the distance between the two representations of London can be found. It may come as a surprise that london and london2 are just over 2 km apart!4
st_distance(london2, london_proj)## Units: [m]
## [,1]
## [1,] 2017.95
We have seen how vector geometries are made-up of points, and how points form the basis of more complex objects such as lines and polygons. Reprojecting vectors thus consists of transforming the coordinates of these points. This is illustrated by cycle_hire_osm, an sf object from spData that represents cycle hire locations across London. The previous section showed how the CRS of vector data can be queried with st_crs(). Although the output of this function is printed as a single entity, the result is in fact a named list of class crs, with names proj4string (which contains full details of the CRS) and epsg for its code. This is demonstrated below:
crs_lnd = st_crs(cycle_hire_osm)
class(crs_lnd)## [1] "crs"
crs_lnd$epsg## [1] 4326
This duality of CRS objects means that they can be set either using an EPSG code or a proj4string. This means that st_crs("+proj=longlat +datum=WGS84 +no_defs") is equivalent to st_crs(4326), although not all proj4strings have an associated EPSG code. Both elements of the CRS are changed by transforming the object to a projected CRS:
cycle_hire_osm_projected = st_transform(cycle_hire_osm, 27700)The resulting object has a new CRS with an EPSG code 27700. But how to find out more details about this EPSG code, or any code? One option is to search for it online. Another option is to use a function from the rgdal package to find the name of the CRS:
crs_codes = rgdal::make_EPSG()[1:2]
dplyr::filter(crs_codes, code == 27700)## code note
## 1 27700 # OSGB 1936 / British National Grid
The result shows that the EPSG code 27700 represents the British National Grid, a result that could have been found by searching online for “EPSG 27700”. But what about the proj4string element? proj4strings are text strings in a particular format the describe the CRS. They can be seen as formulas for converting a projected point into a point on the surface of the Earth and can be accessed from crs objects as follows (see proj4.org for further details of what the output means):
st_crs(27700)$proj4string
#> [1] "+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 ...st_crs function, for example, st_crs(cycle_hire_osm).
The projection concepts described in the previous section apply equally to rasters. However, there are important differences in reprojection of vectors and rasters: transforming a vector object involves changing the coordinates of every vertex but this does not apply to raster data. Rasters are composed of rectangular cells of the same size (expressed by map units, such as degrees or meters), so it is impossible to transform coordinates of pixels separately. Raster reprojection involves creating a new raster object, often with a different number of columns and rows than the original. The attributes must subsequently be re-estimated, allowing the new pixels to be ‘filled’ with appropriate values. In other words, raster reprojection can be thought of as two separate spatial operations: a vector reprojection of cell centroids to another CRS (Section @ref(reproj-vec-geom)), and computation of new pixel values through resampling (Section @ref(aggregation-and-disaggregation)). Thus in most cases when both raster and vector data are used, it is better to avoid reprojecting rasters and reproject vectors instead.
The raster reprojection process is done with projectRaster() from the raster package. Like the st_transform() function demonstrated in the previous section, projectRaster() takes a geographic object (a raster dataset in this case) and a crs argument. However, projectRaster() only accepts the lengthy proj4string definitions of a CRS rather than concise EPSG codes.
proj4string definition with "+init=epsg:MY_NUMBER". For example, one can use the "+init=epsg:4326" definition to set CRS to WGS84 (EPSG code of 4326). The PROJ library automatically adds the rest of the parameters and converts them into "+init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0".
Let’s take a look at two examples of raster transformation: using categorical and continuous data. Land cover data are usually represented by categorical maps. The nlcd2011.tif file provides information for a small area in Utah, USA obtained from National Land Cover Database 2011 in the NAD83 / UTM zone 12N CRS.
cat_raster = raster(system.file("raster/nlcd2011.tif", package = "spDataLarge"))
crs(cat_raster)## CRS arguments:
## +proj=utm +zone=12 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m
## +no_defs
In this region, 14 land cover classes were distinguished (a full list of NLCD2011 land cover classes can be found at mrlc.gov):
unique(cat_raster)## [1] 11 21 22 23 31 41 42 43 52 71 81 82 90 95
When reprojecting categorical rasters, the estimated values must be the same as those of the original. This could be done using the nearest neighbor method (ngb). This method sets each new cell value to the value of the nearest cell (center) of the input raster. An example is reprojecting cat_raster to WGS84, a geographic CRS well suited for web mapping. The first step is to obtain the PROJ definition of this CRS, which can be done using the http://spatialreference.org webpage. The final step is to reproject the raster with the projectRaster() function which, in the case of categorical data, uses the nearest neighbor method (ngb):
wgs84 = "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs"
cat_raster_wgs84 = projectRaster(cat_raster, crs = wgs84, method = "ngb")Many properties of the new object differ from the previous one, including the number of columns and rows (and therefore number of cells), resolution (transformed from meters into degrees), and extent, as illustrated in Table @ref(tab:catraster) (note that the number of categories increases from 14 to 15 because of the addition of NA values, not because a new category has been created — the land cover classes are preserved).
| CRS | nrow | ncol | ncell | resolution | unique_categories |
|---|---|---|---|---|---|
| NAD83 | 1359 | 1073 | 1458207 | 31.5275 | 14 |
| WGS84 | 1394 | 1111 | 1548734 | 0.0003 | 15 |
Reprojecting numeric rasters (with numeric or in this case integer values) follows an almost identical procedure. This is demonstrated below with srtm.tif in spDataLarge from the Shuttle Radar Topography Mission (SRTM), which represents height in meters above sea level (elevation) with the WGS84 CRS:
con_raster = raster(system.file("raster/srtm.tif", package = "spDataLarge"))
crs(con_raster)## CRS arguments:
## +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
We will reproject this dataset into a projected CRS, but not with the nearest neighbor method which is appropriate for categorical data. Instead, we will use the bilinear method which computes the output cell value based on the four nearest cells in the original raster. The values in the projected dataset are the distance-weighted average of the values from these four cells: the closer the input cell is to the center of the output cell, the greater its weight. The following commands create a text string representing the Oblique Lambert azimuthal equal-area projection, and reproject the raster into this CRS, using the bilinear method:
equalarea = "+proj=laea +lat_0=37.32 +lon_0=-113.04"
con_raster_ea = projectRaster(con_raster, crs = equalarea, method = "bilinear")
crs(con_raster_ea)## CRS arguments:
## +proj=laea +lat_0=37.32 +lon_0=-113.04 +ellps=WGS84
Raster reprojection on numeric variables also leads to small changes to values and spatial properties, such as the number of cells, resolution, and extent. These changes are demonstrated in Table @ref(tab:rastercrs)5:
| CRS | nrow | ncol | ncell | resolution | mean |
|---|---|---|---|---|---|
| WGS84 | 457 | 465 | 212505 | 0.0008 | 1842.548 |
| Equal-area | 467 | 478 | 223226 | 83.2000 | 1842.244 |
There is more to learn about CRSs. An excellent resource in this area, also implemented in R, is the website R Spatial. Chapter 6 of this free online book is recommended reading — see: rspatial.org/spatial/6-crs.html
The degree of compression is often referred to as flattening, defined in terms of the equatorial radius (\(a\)) and polar radius (\(b\)) as follows: \(f = (a - b) / a\). The terms ellipticity and compression can also be used [@maling_coordinate_1992]. Because \(f\) is a rather small value, digital ellipsoid models use the ‘inverse flattening’ (\(rf = 1/f\)) to define the Earth’s compression. Values of \(a\) and \(rf\) in various ellipsoidal models can be seen by executing st_proj_info(type = "ellps").↩
A complete list of the proj4string parameters can be found at https://proj4.org/.↩
For a short description of the most relevant projection parameters and related concepts, see the fourth lecture by Jochen Albrecht hosted at http://www.geography.hunter.cuny.edu/~jochen/GTECH361/lectures/ and information at https://proj4.org/parameters.html. Other great resources on projections are spatialreference.org and progonos.com/furuti/MapProj.↩
The difference in location between the two points is not due to imperfections in the transforming operation (which is in fact very accurate) but the low precision of the manually-created coordinates that created london and london_proj. Also surprising may be that the result is provided in a matrix with units of meters. This is because st_distance() can provide distances between many features and because the CRS has units of meters. Use as.numeric() to coerce the result into a regular number.↩
Another minor change, that is not represented in Table @ref(tab:rastercrs), is that the class of the values in the new projected raster dataset is numeric. This is because the bilinear method works with continuous data and the results are rarely coerced into whole integer values. This can have implications for file sizes when raster datasets are saved.↩